Appl. No. 10/689,257
Amdt. dated August 15, 2006
Reply to Office Action of 15 May 2006

Amendments to the Specification:

Please replace the current title with the following new title:

Method of Shifting Data Along Diagonals in a Group of Processing Elements to Transpose the Data

Please replace paragraph [0007] with the following amended paragraph:

[0007] In the past, several different methods of connecting PEs have been used in a variety of geometric arrangements including hypercubes, butterfly networks, one-dimensional strings/rings and two-dimensional meshes. In a two-dimensional mesh or array, the PEs are arranged in rows and columns, with each PE being connected to its four neighboring PEs in the rows above and below and columns to either side, which are sometimes referred to as north, south, east and west connections.

Please replace paragraph [0049] with the following amended paragraph:

[0049] Eight host memory access registers (H) may be provided which allow for a short burst of four or eight bytes to be transferred into or out of the DRAM 24 for host access. Those registers may be multiplexed and be visible from the host memory interface 22 (see FIG. 1) as a page of data. More details about the PEs may be found in G.B. Patent Application No. 0221562.2 entitled Host Memory Interface for a Parallel Processor and filed September 17, 2002, which is hereby incorporated by reference.

Please replace paragraph [0052] with the following amended paragraph:

[0052] At the edges of the array 36, the out-of-array connection is selected through a multiplexer to be either the output from the opposite side of the array or an edge/row register 54 or an edge/col. register 56. The edge registers 54, 56 can be loaded from the array output or from the controller data bus. A data shift in the array can be performed by loading the X register from one of the four neighboring directions.
The contents of the X register can be conditionally loaded on the AND gate of the row select and column select signals which intersect at each PE. When the contents of the X register are conditionally loaded, the edge registers 54, 56 are also loaded conditionally depending on the value of the select line which runs in the same direction. Hence, an edge/row register 54 is loaded if the column select for that column is set to 1 and an edge/col. register 56 is loaded if the row select is set to 1. The reader desiring more information about the hardware configuration illustrated in FIG. 5 is directed to G.B. Patent Application GB0221563_0, entitled Control of Processing Elements in Parallel Processors, filed September 17, 2002, now Patent No. GB2395299, which is hereby incorporated by reference.

Please replace paragraph [0057] with the following amended paragraph:

[0057] Returning to FIG. 5, the PE-PE interconnect may also provide a broadcast and broadcatch network. Connections or buses 58 extend north to south from a column select register 59 and connections or buses 60 extend west to east from a row select register 61. Also provided are a row broadcast/broadcatch AND chain 62 and a column broadcast/broadcatch AND chain. When used for data broadcast or broadcatch, these connections (column buses 58 and row buses 60) act as if driven by open drain drivers; the value on any bit is the wire-AND of all the drivers' outputs. Three control signals (broadcatch, broadcast and intercast) determine the direction of the buses as follows:

- If broadcatch is set to 1, any PE for which the corresponding bits of the row select register 61 and column select register 59 are both set will drive both the row buses 60 and the column buses 58. Note that if no PEs in a row or column drive the bus, the edge register at the end of that row or column will be loaded with 0xFF.
- If broadcast is set to 1, the row bus 60 is driven from the row select register 61 and the column bus 58 is driven from the column select register 59, and any PE for which the corresponding bits of the row select register 61 and column select register 59 are both set will be loaded from one of the row or column inputs, according to which is selected.
- If intercast is set to 1, any PE whose A register is 1 will drive its output onto its row bus 60 and column bus 58, and any PE for which the corresponding bits of the row select register 61 and column select register 59 are both set will be loaded from one of the row buses 60 or column buses 58, according to which is selected.

Please replace paragraph [0066] with the following amended paragraph:

[0066] All X values are passed through the PE; the required output value is conditionally loaded once it has arrived in the PE. The conditional loading can be done in various ways, e.g., by using any PE registers except X, R1, or R2. An example is shown below.

| Clock Cycle | PE C + 0 | | PE C + 1 | | PE C + 2 | | PE C + 3 |
|-------------|----------|---|----------|---|----------|---|----------|
| T + 0 | X <= xe(east) | ⇐ | X <= xe | ⇐ | X <= xe | ⇐ | X <= xe |
| | ⇓ | | ⇓ | | ⇓ | | ⇓ |
| T + 1 | R1 <= X | | R1 <= X | | R1 <= X | | R1 <= X |
| | ⇓ | | ⇓ | | ⇓ | | ⇓ |
| T + 2 | <cond>?R0 <= R1 | | <cond>?R0 <= R1 | | <cond>?R0 <= R1 | | <cond>?R0 <= R1 |

- At time T+0: The X register reads data from the X register on the PE to the East. This shifts data to the left (or West).
- At time T+1: The R1 register unconditionally reads the data off the shift network (X register).
- At time T+2: The R0 register conditionally loads the data from R1 (i.e., if <cond>=1).
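
By way of illustration only, the three clock cycles described above may be modeled in software roughly as follows. This is a minimal sketch and not the hardware disclosed herein: the `PE` class, the function names, the sample register values, and the `edge_in` parameter (standing in for whatever value the multiplexer at the east edge of the array supplies) are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """Hypothetical model of one processing element in a single row."""
    x: int = 0          # X register (part of the shift network)
    r1: int = 0         # R1 register
    r0: int = 0         # R0 register (destination of the conditional load)
    cond: bool = False  # per-PE condition, <cond> in the table


def shift_west(row, edge_in=0):
    """T+0: every PE loads X from the X register of the PE to its east."""
    old = [pe.x for pe in row]  # snapshot so all loads see pre-clock values
    for i, pe in enumerate(row):
        # The easternmost PE has no east neighbor; it takes the assumed edge value.
        pe.x = old[i + 1] if i + 1 < len(row) else edge_in


def latch_r1(row):
    """T+1: R1 unconditionally reads the shift network (X register)."""
    for pe in row:
        pe.r1 = pe.x


def conditional_load(row):
    """T+2: R0 loads from R1 only in PEs where <cond> is 1."""
    for pe in row:
        if pe.cond:
            pe.r0 = pe.r1


def run_example():
    # Four PEs, C+0 .. C+3, holding arbitrary sample values in X.
    row = [PE(x=v) for v in (10, 11, 12, 13)]
    row[0].cond = True
    row[2].cond = True
    shift_west(row, edge_in=99)
    latch_r1(row)
    conditional_load(row)
    return [(pe.x, pe.r1, pe.r0) for pe in row]
```

In this sketch every X value moves one position west on the first cycle, every R1 captures it on the second, and only the PEs with the condition set (here C+0 and C+2) commit the value to R0 on the third, which mirrors the statement that all X values pass through each PE while the required value is loaded conditionally.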